We show that it is possible to predict which deep network generated a given logit vector with accuracy well above chance. We consider a number of networks on a dataset, initialized with random or pretrained weights, as well as fine-tuned networks. A classifier is trained on the logit vectors produced over the training set of this dataset to map each logit vector to the index of the network that generated it. The classifier is then evaluated on the test set. Results are strongest with randomly initialized networks, but also generalize to pretrained and fine-tuned ones. Classification accuracy is higher with unnormalized logits than with normalized ones. We find that there is little transfer when applying a classifier to the same networks with different sets of weights. Besides helping us better understand deep networks and the way they encode uncertainty, we anticipate our finding to be useful in some applications (e.g., tailoring an adversarial attack to a certain type of network). Code is available at https://github.com/aliborji/logits.
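A minimal sketch of this fingerprinting setup, assuming a list of trained PyTorch models and data loaders over the dataset's splits; the choice of logistic regression is an assumption, since the abstract does not name the classifier:

```python
# Sketch of the logit-fingerprinting experiment: every network sees the
# same images, and each logit vector is labeled with the index of the
# network that produced it. All names here are illustrative.
import torch
import numpy as np
from sklearn.linear_model import LogisticRegression

def collect_logits(networks, loader, device="cpu"):
    X, y = [], []
    with torch.no_grad():
        for images, _ in loader:
            images = images.to(device)
            for idx, net in enumerate(networks):
                logits = net(images)            # unnormalized logits
                X.append(logits.cpu().numpy())
                y.append(np.full(len(images), idx))
    return np.concatenate(X), np.concatenate(y)

# X_train, y_train = collect_logits(networks, train_loader)
# clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# X_test, y_test = collect_logits(networks, test_loader)
# print("network-identification accuracy:", clf.score(X_test, y_test))
```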
My goal in this paper is twofold: to study how well deep models understand the images generated by DALL-E 2 and Midjourney, and to quantitatively evaluate these generative models. Two sets of generated images are collected for object recognition and visual question answering (VQA) tasks. On object recognition, the best of 10 state-of-the-art object recognition models achieves about 60% top-1 and 80% top-5 accuracy. These numbers are much lower than the best accuracies on the ImageNet dataset (91% and 99%). On VQA, the OFA model scores 77.3% when answering 241 binary questions over 50 images; the same model scores 94.7% on the binary VQA-v2 dataset. Humans are able to recognize the generated images and answer questions about them easily. We conclude that a) deep models struggle to understand the generated content and may do better after fine-tuning, and b) there is a large distribution shift between generated images and real photographs, and this shift appears to be category-dependent. Data is available at: https://drive.google.com/file/d/1n2nciaxtyjrrf2r73-lne3zggeu_heh0/view?usp=sharing.
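A hedged sketch of the top-1 / top-5 measurement described above, assuming a pretrained classifier and a loader over labeled generated images; the names are placeholders, not the paper's code:

```python
# Count a hit at top-k if the true label appears among the k highest
# scoring classes; accuracy is hits divided by the number of images.
import torch

def topk_accuracy(model, loader, ks=(1, 5), device="cpu"):
    hits = {k: 0 for k in ks}
    total = 0
    model.eval()
    with torch.no_grad():
        for images, labels in loader:
            logits = model(images.to(device))
            topk = logits.topk(max(ks), dim=1).indices.cpu()
            for k in ks:
                hits[k] += (topk[:, :k] == labels.unsqueeze(1)).any(1).sum().item()
            total += len(labels)
    return {k: hits[k] / total for k in ks}

# acc = topk_accuracy(recognizer, generated_loader)  # e.g. {1: 0.60, 5: 0.80}
```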
Almost all adversarial attacks add imperceptible perturbations in order to fool a model. Here, we consider the opposite: adversarial examples that can fool a human rather than a model. A perturbation that is large and perceptible is added such that the model maintains its original decision, whereas a human, if forced to decide (or to opt out of deciding altogether), will most likely make a mistake. Existing targeted attacks can be reformulated to generate such adversarial examples. Our proposed attack, dubbed NKE, is similar in essence to fooling images, but is more efficient because it uses gradient descent instead of evolutionary algorithms. It also offers a new, unified perspective on the problem of adversarial vulnerability. Experimental results on the MNIST and CIFAR-10 datasets show that our attack is quite effective at fooling deep neural networks. Code is available at https://github.com/aliborji/nke.
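A sketch in the spirit of the fooling-images comparison above, not the paper's exact NKE procedure: starting from random noise, gradient-based optimization drives the model's confidence in a chosen class up, yielding an image the model labels confidently but a human would not recognize. Step count and learning rate are assumptions.

```python
import torch
import torch.nn.functional as F

def fool_with_gradient_descent(model, target_class, shape=(1, 3, 32, 32),
                               steps=200, lr=0.05):
    x = torch.rand(shape, requires_grad=True)   # perceptible "noise" start
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # lower the loss for the target class -> raise the model's confidence
        loss = F.cross_entropy(model(x), torch.tensor([target_class]))
        loss.backward()
        opt.step()
        x.data.clamp_(0, 1)                     # keep a valid image
    return x.detach()
```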
Short answer: Yes; long answer: No! Research on adversarial robustness has indeed led to invaluable insights, helping us understand and explore different aspects of the problem. Many attacks and defenses have been proposed over the last few years. The problem, however, remains largely unsolved and poorly understood. Here, I argue that the current formulation of the problem serves short-term goals and needs to be revised for us to achieve bigger gains. Specifically, the bound on the perturbation has created a somewhat contrived setting and needs to be relaxed. It has misled us into focusing on model classes that are not expressive enough. Instead, inspired by human vision and the fact that we rely more on robust features such as shape, vertices, and foreground objects than on non-robust features such as texture, efforts should be steered towards looking for significantly different classes of models. Perhaps instead of narrowing in on imperceptible adversarial perturbations, we should attack a more general problem: finding architectures that are simultaneously robust to perceptible perturbations, geometric transformations (e.g. rotation, scaling), image distortions (lighting, blur), and more (e.g. occlusion, shadow). Only then may we be able to solve the problem of adversarial vulnerability.
We present SplitMixer, a simple and lightweight isotropic MLP-like architecture for visual recognition. It contains two types of interleaving convolutional operations to mix information across spatial locations (spatial mixing) and channels (channel mixing). The first sequentially applies two depthwise 1D kernels, instead of a 2D kernel, to mix spatial information. The second splits the channels into overlapping or non-overlapping segments, with or without shared parameters, and applies our proposed channel-mixing approaches or 3D convolution to mix channel information. Depending on the design choices, a number of SplitMixer variants can be constructed to balance accuracy, the number of parameters, and speed. We show, both theoretically and experimentally, that SplitMixer performs on par with state-of-the-art MLP-like models while having a significantly lower number of parameters and FLOPS. For example, without strong data augmentation and optimization, SplitMixer achieves around 94% accuracy on CIFAR-10 with only 0.28M parameters, whereas ConvMixer achieves the same accuracy with about 0.6M parameters; the well-known MLP-Mixer reaches 85.45% with 17.1M parameters. On the CIFAR-100 dataset, SplitMixer achieves around 73% accuracy, on par with ConvMixer, but with about 52% fewer parameters and FLOPS. We hope that our results spark further research towards finding more efficient vision architectures and facilitate the development of MLP-like models. Code is available at https://github.com/aliborji/splitmixer.
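A minimal sketch of the spatial-mixing idea: two depthwise 1D convolutions (k x 1, then 1 x k) in place of a single depthwise k x k kernel. Normalization and activation placement follow ConvMixer-style blocks and are assumptions, not the exact SplitMixer recipe:

```python
import torch.nn as nn

def split_spatial_mixing(dim, k=5):
    return nn.Sequential(
        nn.Conv2d(dim, dim, (k, 1), groups=dim, padding=(k // 2, 0)),  # vertical 1D
        nn.GELU(),
        nn.BatchNorm2d(dim),
        nn.Conv2d(dim, dim, (1, k), groups=dim, padding=(0, k // 2)),  # horizontal 1D
        nn.GELU(),
        nn.BatchNorm2d(dim),
    )

# A k x k depthwise kernel costs k*k*dim weights; the two 1D kernels cost
# 2*k*dim, which is where much of the parameter saving comes from.
```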
The COCO dataset has been the central test bed for object detection research for nearly a decade. According to recent benchmarks, however, performance on this dataset appears to have started to saturate. One possible reason may be that it is not large enough for training deep models. To address this limitation, we introduce here two datasets complementary to COCO: i) COCO_OI, composed of images from COCO and OpenImages (from their 80 classes in common), with 1,418,978 training boxes over 380,111 images and 41,893 validation boxes over 18,299 images, and ii) ObjectNet_D, containing objects in daily-life situations (originally created for object recognition and known as ObjectNet; 29 categories in common with COCO). The latter can be used to test the generalization ability of object detectors. We evaluate some models on these datasets and pinpoint the sources of errors. We encourage the community to use these datasets for training and testing object detection models. Code and data are available at https://github.com/aliborji/coco_oi.
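A detector can be scored on a COCO-format dataset such as COCO_OI with the standard pycocotools evaluation loop; the file paths below are placeholders:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("coco_oi_val_annotations.json")      # ground-truth boxes
coco_dt = coco_gt.loadRes("detector_results.json")  # model predictions

ev = COCOeval(coco_gt, coco_dt, iouType="bbox")
ev.evaluate()
ev.accumulate()
ev.summarize()   # prints AP / AP50 / AP75 and size-stratified scores
```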
Object detection is a fundamental vision task. It is highly researched in academia and widely adopted in industry. Average Precision (AP) is the standard score for evaluating object detectors. Our understanding of the subtleties of this score, however, is limited. Here, we quantify the sensitivity of AP to bounding-box perturbations and show that AP is very sensitive to small translations: a shift of only one pixel is enough to drop the mAP of a model by 8.4%, and the mAP drop over small objects with a one-pixel shift is 23.1%. When ground-truth (GT) boxes are used as predictions, the corresponding numbers are 23% and 41.7%, respectively. These results explain why achieving higher mAP becomes increasingly harder as models get better. We also investigate the effect of box scaling on AP. Code and data are available at https://github.com/aliborji/ap_box_perturbation.
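A small worked example of why one-pixel shifts hurt small objects disproportionately: the IoU between a box and the same box shifted by one pixel, for a large and a small box (box format (x1, y1, x2, y2); sizes are illustrative):

```python
def iou(a, b):
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def shift(box, dx=1, dy=0):
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)

large = (0, 0, 100, 100)         # 100 x 100 box
small = (0, 0, 10, 10)           # 10 x 10 box
print(iou(large, shift(large)))  # ~0.980: barely affected
print(iou(small, shift(small)))  # ~0.818: already fails IoU thresholds above 0.8
```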
Humans rely heavily on shape information to recognize objects. Convolutional neural networks (CNNs), conversely, are biased towards texture, which is perhaps the main reason why CNNs are vulnerable to adversarial examples. Here, we explore how shape bias can be incorporated into CNNs to improve their robustness. Two algorithms are proposed, based on the observation that edges are invariant to moderate, imperceptible perturbations. In the first, a classifier is adversarially trained on images with their edge maps as an additional channel; at inference time, the edge map is recomputed and concatenated to the image. In the second algorithm, a conditional GAN is trained to translate edge maps, from clean and/or perturbed images, into clean images; inference is done over the generated image corresponding to the input's edge map. Extensive experiments over 10 datasets demonstrate the effectiveness of the algorithms against FGSM and $\ell_\infty$ PGD-40 attacks. Further, we show that a) edge information can also benefit other adversarial training methods, and b) CNNs trained on edge-augmented inputs are more robust against natural image corruptions, such as motion blur, impulse noise, and JPEG compression, than CNNs trained solely on RGB images. From a broader perspective, our study suggests that CNNs do not adequately account for image structures that are crucial for robustness. Code is available at https://github.com/aliborji/shapedefense.git.
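A hedged sketch of the first algorithm's input pipeline: compute an edge map and append it to the RGB image as a fourth channel, both at training and at inference time. Canny is one reasonable edge detector; the paper's exact detector and thresholds are not specified in the abstract:

```python
import cv2
import numpy as np

def edge_augment(rgb):
    """rgb: H x W x 3 uint8 image -> H x W x 4 array (RGB + edge map)."""
    gray = cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY)
    edges = cv2.Canny(gray, 100, 200)   # binary edge map in {0, 255}
    return np.dstack([rgb, edges])      # 4-channel input for the CNN

# At inference, the edge map is recomputed from the (possibly perturbed)
# input, so the classifier always sees edges consistent with the image.
```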
Foreground map evaluation is crucial for gauging the progress of object segmentation algorithms, in particular in the field of salient object detection, where the purpose is to accurately detect and segment the most salient object in a scene. Several widely used measures, such as Area Under the Curve (AUC), Average Precision (AP), and the recently proposed $F^{\omega}_{\beta}$ (Fbw), have been used to evaluate the similarity between a non-binary saliency map (SM) and a ground-truth (GT) map. These measures are based on pixel-wise errors and often ignore structural similarities. Behavioral vision studies, however, have shown that the human visual system is highly sensitive to structures in scenes. Here, we propose a novel, efficient, and easy-to-calculate measure, the structural similarity measure (Structure-measure), to evaluate non-binary foreground maps. Our new measure simultaneously evaluates region-aware and object-aware structural similarity between an SM and a GT map. We demonstrate the superiority of our measure over existing ones using 5 meta-measures on 5 benchmark datasets.
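A minimal sketch of how the two terms are combined, assuming the object-aware (S_o) and region-aware (S_r) similarity scores have already been computed; the convex combination with alpha = 0.5 is the natural reading of the description above, not a full reimplementation:

```python
def structure_measure(s_object, s_region, alpha=0.5):
    """S = alpha * S_o + (1 - alpha) * S_r, both terms in [0, 1]."""
    return alpha * s_object + (1 - alpha) * s_region
```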
Recent progress on salient object detection is substantial, benefiting mostly from the explosive development of Convolutional Neural Networks (CNNs). Semantic segmentation and salient object detection algorithms developed lately have mostly been based on Fully Convolutional Neural Networks (FCNs). There is still large room for improvement over the generic FCN models, which do not explicitly deal with the scale-space problem. The Holistically-Nested Edge Detector (HED) provides a skip-layer structure with deep supervision for edge and boundary detection, but the performance gain of HED on saliency detection is not obvious. In this paper, we propose a new salient object detection method by introducing short connections to the skip-layer structures within the HED architecture. Our framework takes full advantage of multi-level and multi-scale features extracted from FCNs, providing more advanced representations at each layer, a property that is critically needed to perform segment detection. Our method produces state-of-the-art results on 5 widely tested salient object detection benchmarks, with advantages in terms of efficiency (0.08 seconds per image), effectiveness, and simplicity over existing algorithms. Beyond that, we conduct an exhaustive analysis of the role of training data in performance. Our experimental results provide a more reasonable and powerful training set for future research and fair comparisons.
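An illustrative sketch of the short-connection idea: each side output of a HED-style network also receives upsampled activations from deeper side outputs, so shallow layers can localize salient regions using high-level context. Layer names and channel counts are placeholders, not the paper's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShortConnectedSideOutput(nn.Module):
    def __init__(self, in_channels, num_deeper):
        super().__init__()
        # fuse this stage's features with all deeper side outputs
        self.score = nn.Conv2d(in_channels + num_deeper, 1, kernel_size=1)

    def forward(self, feats, deeper_side_outputs):
        h, w = feats.shape[-2:]
        ups = [F.interpolate(d, size=(h, w), mode="bilinear",
                             align_corners=False)
               for d in deeper_side_outputs]       # short connections
        return self.score(torch.cat([feats] + ups, dim=1))
```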